Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning
Authors
Abstract
We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite-horizon MDP, optimizing the variance of the per-step reward random variable. MVPI enjoys great flexibility in that any policy evaluation method and any risk-neutral control method can be dropped in off the shelf, in both on- and off-policy settings. This flexibility reduces the gap between risk-neutral and risk-averse control and is achieved by working on a novel augmented MDP directly. We propose risk-averse TD3 as an example instantiating MVPI, which outperforms vanilla TD3 and many previous risk-averse methods in challenging Mujoco robot simulation tasks under a risk-aware performance metric. It is the first to introduce deterministic policies and off-policy learning into risk-averse reinforcement learning, both of which are key to the performance boost we show in Mujoco domains.
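The MVPI loop can be illustrated on a toy problem. The sketch below is a minimal bandit-style illustration, not the paper's implementation: the arm names, reward statistics, and lambda value are all invented for this example. It alternates a policy-evaluation step (estimate the mean per-step reward y of the current policy) with a risk-neutral control step on the augmented reward r - lambda*(r - y)^2, whose expectation for an arm with mean mu and variance var is mu - lambda*var - lambda*(mu - y)^2.

```python
# Minimal bandit-style sketch of the MVPI loop (illustrative assumptions:
# the arms, their reward statistics, and LAM are made up for this example).
# Each "policy" deterministically picks one arm; per-step reward for an arm
# has a known mean and variance, so both steps can be done analytically.
ARMS = {            # arm -> (mean reward, reward variance)
    "safe":   (1.0, 0.1),
    "risky":  (1.5, 4.0),
    "medium": (1.2, 0.5),
}
LAM = 0.5           # risk-aversion coefficient lambda

def mvpi(lam, iters=20):
    """Alternate: (1) policy evaluation: y <- mean per-step reward of the
    current policy; (2) risk-neutral control on the augmented reward
    r - lam*(r - y)^2, i.e. pick the arm maximizing its expectation
    mu - lam*var - lam*(mu - y)^2."""
    arm = "risky"                      # arbitrary initial policy
    for _ in range(iters):
        y = ARMS[arm][0]               # (1) evaluation step
        arm = max(ARMS, key=lambda a: ARMS[a][0]
                  - lam * ARMS[a][1]
                  - lam * (ARMS[a][0] - y) ** 2)   # (2) control step
    return arm

print(mvpi(LAM))    # with lam > 0 the loop settles on a lower-variance arm
```

With lam = 0 the augmented reward reduces to the plain reward and the loop recovers risk-neutral control (the highest-mean arm); with positive lam it trades mean against variance, which is the mean-variance objective E[r] - lambda*Var(r) the framework targets.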
Similar papers
Reinforcement Learning Leads to Risk Averse Behavior
Animals and humans often have to choose between options with reward distributions that are initially unknown and can only be learned through experience. Recent experimental and theoretical work has demonstrated that such decision processes can be modeled using computational models of reinforcement learning (Daw et al., 2006; Erev & Barron, 2005; Sutton & Barto, 1998). In these models, agents use...
Preference-Based Policy Iteration: Leveraging Preference Learning for Reinforcement Learning
This paper makes a first step toward the integration of two subfields of machine learning, namely preference learning and reinforcement learning (RL). An important motivation for a “preference-based” approach to reinforcement learning is a possible extension of the type of feedback an agent may learn from. In particular, while conventional RL methods are essentially confined to deal with numeri...
Equilibrium in an ambiguity-averse mean-variance investors market
Keywords: robust optimization; mean–variance portfolio theory; ellipsoidal uncertainty; equilibrium price system. In a financial market composed of n risky assets and a riskless asset, where short sales are allowed and mean–variance investors can be ambiguity averse, i.e., diffident about mean return estimates where confidence is represented using ellipsoidal uncertainty sets, we de...
Active Policy Iteration: Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning
Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. Then we propose a design method of good sampling policies for efficient exploration, which is particularl...
Bellman Gradient Iteration for Inverse Reinforcement Learning
This paper develops an inverse reinforcement learning algorithm aimed at recovering a reward function from the observed actions of an agent. We introduce a strategy to flexibly handle different types of actions with two approximations of the Bellman Optimality Equation, and a Bellman Gradient Iteration method to compute the gradient of the Q-value with respect to the reward function. These metho...
Journal
Journal title: Proceedings of the AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i12.17302